Parallel Inductive Logic in Data Mining

Author

  • Yu Wang
Abstract

Data mining is the process of automatically extracting novel, useful, and understandable patterns from very large databases. High-performance, scalable, parallel computing algorithms are crucial in data mining as datasets grow inexorably in size and complexity. Inductive logic is a research area at the intersection of machine learning and logic programming that has recently been applied to data mining. Inductive logic studies learning from examples within the framework provided by clausal logic. It provides a uniform and very expressive means of representation: all examples, background knowledge, and the induced theory are expressed in first-order logic. However, such an expressive representation is often computationally expensive. This report first presents the background for parallel data mining, the BSP model, and inductive logic programming. Based on this study, the report gives an approach to parallel inductive logic in data mining that addresses the potential performance problem. Both a parallel algorithm and its cost analysis are provided. The approach is applied to a number of problems and shows a super-linear speedup. To justify this analysis, I implemented a parallel version of a core ILP system, Progol, in C with the support of the BSP parallel model. Three test cases are provided, and a double speedup phenomenon is observed on all of these datasets and on two different parallel computers.

Similar resources

Parallel Inductive Logic for Data Mining

Data mining is the process of automatic extraction of novel, useful and understandable patterns in very large databases. High-performance, scalable, and parallel computing algorithms are crucial in data mining as datasets grow in size and complexity. Inductive logic is a research area in the intersection of machine learning and logic programming, which has been recently applied to data mining. ...

Inductive Logic Programming for Bioinformatics in Prova

This paper describes the inductive logic programming (ILP) features of Prova, a state-of-the-art distributed Semantic Web and Life Science inference service system and architecture for multi-relational data mining of complex Life Science phenomena such as complex biological relationships. The proposed novel design artifact implements typical ILP inference formalisms for rule-based generalization an...

Distributed Generative Data Mining

A process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be run on dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual for the user to specify the workflow of the sub-tasks composing th...

IndLog - Induction in Logic

IndLog is a general-purpose Prolog-based Inductive Logic Programming (ILP) system. It is theoretically based on Mode Directed Inverse Entailment and has several distinguishing features that make it adequate for a wide range of applications. To search efficiently through large hypothesis spaces, IndLog uses original features like lazy evaluation of examples and Language Level Search. IndLog...

An Inductive Logic Programming Query Language for Database Mining

First, a short introduction to inductive logic programming and machine learning is presented, followed by an inductive database mining query language, RDM (Relational Database Mining language). RDM integrates concepts from inductive logic programming, constraint logic programming, deductive databases and meta-programming into a flexible environment for relational knowledge discovery in databases. Th...

Publication date: 2000